The Graduate Record Examinations committees for
French, Philosophy, and English Literature participated in an
investigation of the feasibility of conducting validity studies of
the GRE using a common criterion task. It was determined that such a
study was not feasible. However, some committee members suggested that
many graduate departments used some type of ratings of graduate
students; that rating scale criteria would be generally acceptable to
the various disciplines; and that it would be feasible to conduct
studies using this type of criterion. It appeared from the
investigation that a sufficient number of departments use a
three-or-greater level rating procedure to warrant an attempt to
conduct some preliminary validity studies using existing rating data
as criteria. The probable variation between the rating scales
currently in use in departments at different universities, both in
terms of attributes rated and type of scale quality, suggests that a
uniform set of criterion rating scales should be developed prior to
attempting to conduct validity studies using rating scales as
criterion measures. (Author/SB)
The Feasibility of Common Criterion Validity

Studies of the GRE 1 

Introduction and Background 

The Research Committee of the Graduate Record Examinations Board has been 
concerned for some time with the paucity of validity data for the GRE. Although
the number of validity studies has increased in recent years (Willingham, 1973) 
the amount of data is still best described as sparse. In the main, two problems 
have brought about this situation. The first is the small (for statistical pur- 
poses) number of students admitted to graduate study by a single department 
within a university in a given year or even over the period of two or three 
years. Ideally, at least 100 students must be admitted within a one- or two-
year period for a meaningful study to be conducted.

The second problem looms even larger in the minds of many graduate deans. 
This is the criterion problem. Although grade-point average has long served
as a natural and effective performance criterion at the college level, the same
measure, when viewed in the graduate context, appears, if not inappropriate,
certainly inadequate. Other criteria which have been developed in an attempt to
overcome some of the limitations of grades, such as global ratings or attain- 
ment of the doctorate, while offering some advantages, fail to reflect important 
aspects of performance in the graduate context. 

Over the years many GRE Committees of Examiners have expressed interest in 
having validity studies conducted for the examination for which they are 
responsible. As might be expected, this concern is most often expressed in a
rather general way and usually does not involve suggestions of specific criteria
or procedures. Thus, preliminary discussions were held with several members of
the ETS test development staff, and the consensus was that with some intensive
work it might be possible within several fields of study to develop a measurable
criterion which would be generally acceptable to at least a large segment of
that field.

1 This research was supported by the Graduate Record Examinations Board.

It was expected that in most, if not all, instances the criterion which
would be developed would be one or more essay questions similar to those
generally used for final course examinations or comprehensive (qualifying, pre-lims,
etc.) examinations. Once developed for a given field of study, the common set 
of questions would then be administered to students at the appropriate level at 
several different departments. A similar method has been used in the law school 
context (Klein & Hart, 1968) and has been referred to as the "common criterion" 
approach. 

It was expected that the cooperation of the appropriate department at each
university could be secured by members of the Committee. The essays could then
be graded by a group of professors from the participating departments and these
grades used as a measure of success in graduate school.

The Common Criterion Validity Study:
Discussions with Committees

In March of 1971 the authors sent a memorandum to the GRE Advanced Test 
Development Specialists explaining the idea of a common criterion validity study 
and asking for their advice and suggestions with regard to the feasibility of 
conducting such a study in the field represented by the Committee(s) with whom
they worked. A copy of this memorandum is attached as Appendix A. A number of 
these specialists responded in writing and many others discussed their reaction 
with the investigators. The investigators then followed up these responses with 
telephone conversations with most of the specialists. The specialists not
contacted further were those whose field had a very small volume of candidates or
for which there were obvious problems in designing an adequate criterion.
Finally, based on the information received, an attempt was made to obtain time
on the agenda of a regularly scheduled meeting of several of the Committees.

Discussions were held with three Committees: Philosophy, French, and Literature
in English.

Results of Discussions with Committees 

Discussions with the Committees followed a standard format. First, one 
of the investigators set forth briefly to the Committee the central concepts 
involved in the conduct of a validity study and the major problems associated 
with conducting such studies at the graduate level. The kind of study being 
suggested, with some reference to the LSAT Criterion Study (Klein & Hart, 1968; 
Linn, Klein, & Hart, 1972), was then explained. Typically, some discussion of
the general criterion problem followed. The bulk of the remaining discussion 
focused on appropriate criteria for the field under consideration. A synopsis 
of this final part of the discussions and any subsequent developments follows. 

Philosophy. The GRE Committee of Examiners in Philosophy expressed great
interest in the possibility of conducting a study and discussed possible 
criteria and feasibility questions. They felt that it would be quite feasible 
for a number of departments to agree on one or more questions to be included in 
pre-lim examinations; however, they were convinced that grades on these ques- 
tions would not constitute an adequate criterion for validity studies. They 
concluded that they could not come up with a task or set of tasks for graduate 
students which they would find to be an acceptable criterion. 

Conversation then turned to a discussion of the use of rating scales in 
some of the Committee members' departments. The outcome was that the Committee
felt that rating scales offered real possibilities and suggested that this be 
pursued. 

French. The GRE Committee of Examiners in French displayed a keen interest
in the possibility and discussed possible types of criteria. Their final choice
was a literary analysis at the Master's degree level. They felt that almost all
graduate schools offer some form of the "explication de texte" at the M.A.
examinations, although the style may differ, varying from a free essay of a
couple of paragraphs, to several pages, to a finely structured analysis
controlled by precise and graded questions. They concurred that a structured
"explication" would be the best form to use.

One committee member agreed to act as liaison and during the summer of 1971
wrote to the chairmen of several French departments soliciting their departments'
cooperation in a research study. In general, the chairmen expressed interest in 
such a study but at the same time declined to cooperate. Their reasons usually 
concerned the operational problems that such a study would give rise to at their 
institution. The project was then brought to the attention of a group of "Big 
Ten" foreign language department chairmen with similar results. As a
consequence, there seemed to be little hope of conducting such a common criterion
validity study for graduate study in French.

Literature in English. The GRE Committee of Examiners in Literature in
English felt that there was not an "essay type criterion" which could be applied
at the graduate level. They would always be interested in the relationship 
between GRE scores and an essay examination but did not feel this was an ade- 
quate criterion for graduate study in their field. 

According to the Committee, graduate training in English primarily prepares
people for teaching positions; thus, perhaps the best criterion would be the
attainment of tenure in a "good" department. The Committee expressed interest
in pursuing tenure attainment as a criterion even though they recognized that 
it was distal in nature and that the GRE tests, particularly the Advanced Test
in English Literature, were not designed to predict such a criterion. They
felt that a list of the top 100 departments could be compiled fairly easily 
and with relatively general agreement. This could be done either by a group 
established for that purpose or by determining the amount of federal funding
received. They felt that several such schemes could be worked out which would 
result in essentially the same list. Additional technical problems in design- 
ing the criterion scale were not discussed, and this idea has not been pursued. 

Conclusions of Discussions and Plans for Subsequent Research

After discussing a common criterion validity study involving an essay-type
measure with the GRE Committees of Examiners in three graduate fields, it
became evident that the problems of such a study were insurmountable, and the
procedure was rejected. However, in the course of the discussion with the
Philosophy Committee it was noted that rating scales were used by a number of graduate
departments to classify graduate students according to their developed or 
probable potential in the field; conversations with individual Committee members 
indicated that this was the case in other fields as well. Thus, it was decided 
to investigate the extent and uses of rating scales by graduate departments. 

The Common Criterion Validity Study: 
Survey of Rating Procedures in Use 

To assess the extent to which graduate departments were currently using some 
rating (or ranking) procedures, questionnaires were mailed to a sample of
departments in five areas of graduate study. The questionnaire (see Appendix B)
solicited general information on who was involved in formulating the ratings,
what attributes were taken into consideration, and at what point in the students'
careers the ratings were made.



The departments represented in the study were biology, English, history,
mathematics, and psychology. The sample of departments in each of these fields
was drawn from tables in Students Enrolled for Advanced Degrees, Fall 1969
(1970), which reported totals of first-year graduate students by department.
The criterion for selection was set at a total of 25 (or more) first-year
graduate students. The sample consisted of every other department of that size
listed in the table.
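
The selection rule described above (an enrollment cutoff followed by taking every other listed department) can be sketched as follows. This is only an illustration: the department names and enrollment figures are hypothetical, and starting from the first eligible entry is an assumption, since the report does not say where the alternation began.

```python
def select_sample(departments, min_first_year=25):
    """Keep departments at or above the cutoff, then take every other one.

    Assumption: alternation starts with the first eligible department in
    listing order; the report does not state the starting point.
    """
    eligible = [d for d in departments if d["first_year"] >= min_first_year]
    return eligible[::2]


# Hypothetical listing, in the order it might appear in an enrollment table.
listing = [
    {"name": "Dept A", "first_year": 40},
    {"name": "Dept B", "first_year": 12},  # below the cutoff, never eligible
    {"name": "Dept C", "first_year": 25},  # exactly at the cutoff, eligible
    {"name": "Dept D", "first_year": 31},
    {"name": "Dept E", "first_year": 60},
]

sample = select_sample(listing)
print([d["name"] for d in sample])
```

Because the cutoff is applied before the alternation, small departments do not affect which of the larger departments are drawn.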

Results 

Of the 421 departments contacted, approximately 75% responded to the
questionnaire after one follow-up. No further attempt was made to collect any
data on the remaining departments. The number of questionnaires mailed, the
number of responses, and the number and percentage of respondents who indicated
that some form of ratings (or rankings) were currently used by their department
are presented by field of study in Table 1. More than 50% of the respondent
departments of history and psychology employed some method of rating or ranking
graduate students. Across the five fields, 43% of the departments responding
reported they used some form of rating or ranking.

Table 1

Response to the Questionnaire

                 No. Quest.   No. Responding to   No. Using Ratings   % Using Ratings
Department       Mailed       Questionnaire       and Rankings        and Rankings

Biology              48              37                   5                 14
English             124             100                  45                 45
History              90              63                  38                 60
Mathematics          83              66                  21                 32
Psychology           76              53                  28                 53

TOTAL               421             319                 137                 43
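
The percentages reported for Table 1 can be rechecked from the counts with a short calculation. The sketch below is an illustration rather than part of the original analysis. It assumes that the three counts unreadable in the scanned table (Biology questionnaires mailed, English departments using ratings, History departments responding) equal the values implied by the reported totals of 421, 319, and 137.

```python
# Per field: (questionnaires mailed, responses, departments using ratings).
# Assumption: Biology "mailed" (48), English "using" (45), and History
# "responding" (63) are inferred from the reported column totals, because
# those cells are unreadable in the scanned table.
TABLE_1 = {
    "Biology": (48, 37, 5),
    "English": (124, 100, 45),
    "History": (90, 63, 38),
    "Mathematics": (83, 66, 21),
    "Psychology": (76, 53, 28),
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


def summarize(table):
    """Return column totals, overall rates, and per-field usage percentages."""
    mailed = sum(m for m, _, _ in table.values())
    responded = sum(r for _, r, _ in table.values())
    using = sum(u for _, _, u in table.values())
    return {
        "mailed": mailed,
        "responded": responded,
        "using": using,
        "response_rate": percent(responded, mailed),
        "usage_rate": percent(using, responded),
        "by_field": {f: percent(u, r) for f, (_, r, u) in table.items()},
    }


if __name__ == "__main__":
    summary = summarize(TABLE_1)
    for field, pct in summary["by_field"].items():
        print(f"{field}: {pct}% of responding departments use ratings")
    print(f"Response rate: {summary['response_rate']}%")
    print(f"Overall usage: {summary['usage_rate']}%")
```

Under these assumptions the column sums reproduce the reported totals, the overall usage rate comes to 43%, and the response rate comes to about 76%, in line with the "approximately 75%" stated in the text.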

The responses indicated some confusion about the definition of "rating." 
Some department chairmen indicated that ratings were used but proceeded to 
describe the process as an evaluation by a faculty committee which resulted in 
a pass-fail recommendation. Since the purpose of this study was to investigate
systematic rating procedures, a department was classified as using ratings if 
it specified at least a three-level scale (e.g., unacceptable, acceptable, 
excellent). 

From the departmental responses the following categories of ratings or
rankings were tabulated: (a) general evaluation at end of first year;
(b) general evaluation to determine who is to be allowed to continue in the
program or be recommended for continuing work elsewhere, typically in the second
year or later; (c) evaluation to determine who will receive financial aid;
(d) evaluation of the Master's examination or thesis; and (e) evaluation of
preliminary examinations, oral examinations, or dissertation for the Ph.D.
These tabulations are presented in Table 2. In addition to these major
categories, a small number of departments indicated the use of ratings at the
conclusion of each course, as an annual review, and for such purposes as selecting
teaching or research assistants. In summary, of the departments reporting that
ratings were used, the majority in each of the five fields indicated that the
ratings occurred at the Master's or Ph.D. examination time. However, a number
of departments in each field reported the use of general evaluative ratings
earlier in the students' course of study.






Table 2

Percentage of Ratings Falling into Categories by Field of
Study (Absolute Numbers Given in Brackets)a

                     General Evaluation         Financial Aid     Specific Products
                   End of        Continuing     End of Each
Department         First Year    Work           Year              Master's    Ph.D.

Biology (37)        3%(1)         3%(1)         [illegible]        5%(2)       3%(1)
English (100)       3%(3)         8%(8)         [illegible]       20%(20)     21%(21)
History (63)       11%(7)        10%(6)          8%(5)            27%(17)     33%(21)
Mathematics (66)   [illegible]    3%(2)          3%(2)            11%(7)      15%(10)
Psychology (53)    [illegible]    2%(1)          0%(0)            25%(13)     32%(17)

a A given rating procedure may fall into more than one category.



Conclusions 

Although great interest in the possibility of conducting validity studies 
using a common criterion task was expressed by members of the staff of test 
development and by members of several GRE Committees of Examiners, each of the
committees contacted concluded that such a study was not feasible. However, some
committee members suggested that many graduate departments used some type of 
ratings of graduate students; that rating scale criteria would be generally 
acceptable to the various disciplines; and that it would be feasible to conduct 
studies using this type of criterion. 

It appeared from the survey that a sufficient number of departments use a 
three-or-greater level rating procedure to warrant an attempt to conduct some 
preliminary validity studies using existing rating data as criteria. The 
probable variation between the rating scales currently in use in departments at
different universities, both in terms of attributes rated and type of scale
quality, suggests that a uniform set of criterion rating scales should be 
developed prior to attempting to conduct validity studies using rating scales 
as criterion measures. 
APPENDIX A



Memorandum for: GRE ADVANCED TEST DEVELOPMENT SPECIALISTS

cc: Mrs. Conrad 
Mr. Daves 
Mr. Donlon
Miss Lear 
Mr. McPeek 

Subject: Common Criterion Validity Study (5^0.72)
Date: March 5, 1971
From: Alfred B. Carlson
      Franklin R. Evans

As you may know the Research Committee of the GREB has been concerned for 
several years with the paucity of validity data for the GRE. In the main, two 
problems have brought about this situation. The first is the small (for
statistical purposes) number of students admitted to graduate study by a single department
within a university in a given year or even over the period of two or three years.
Ideally, at least 100 students must be admitted within a one- or two-year period for
a meaningful study to be conducted. 

The second problem looms even larger in the minds of many graduate deans. This
is the criterion problem. Although grade-point average has long served as the
natural and effective performance criterion at the college level, the same measure,
when viewed in the graduate context, appears, if not inappropriate, certainly
inadequate. Other criteria which have been developed in an attempt to overcome some of
the limitations of grades, such as global ratings or attainment of the doctorate, 
while offering some advantages, fail to reflect important aspects of performance in 
the graduate context. 



The attached document is a short proposal for a feasibility study which we
submitted to the Research Committee recently in a package of several studies directed
toward the criterion problem. The study has been funded. We feel that the procedure
we are suggesting will allow us to circumvent some of the problems with more
traditional criteria in some fields and at the same time to construct a criterion which
will be particularly appropriate for examining the predictive validity of some of the
GRE Advanced tests. (We recognise that the Advanced tests may be used for purposes
other than those suggested by the "prediction paradigm." Nevertheless, the extent to
which scores on those tests do relate to indices of achievement in graduate school is,
we feel, an important question.)

It is now that we turn to you for your advice and suggestions with regard to 
the feasibility of such an enterprise in the field represented by the Committee with 
which you work. Please look over the attached material and give some thought to the 
possibility of such a study being conducted in your field. If you have any questions 
please call one of us. The fact that the Committee may not meet this Spring is prob- 
ably not a serious problem. If you feel that your field might be a good possibility, 
even if you see some serious problems (including a serious overcommitment on the part 
of your committee) please let us know so that we can explore it with you further. 